Genome wide association study of clinical duration and age at onset of sporadic CJD

Human prion diseases are rare, transmissible and often rapidly progressive dementias. The most common type, sporadic Creutzfeldt-Jakob disease (sCJD), is highly variable in clinical duration and age at onset. Genetic determinants of late onset or slower progression might suggest new targets for research and therapeutics. We assembled and array-genotyped sCJD cases diagnosed in life or at autopsy. Clinical duration (median 4 months, interquartile range (IQR) 2.5–9 months) was available for 3,773 cases and age at onset (median 67 years, IQR 61–73 years) for 3,767 cases. Phenotypes were successfully transformed to approximate normal distributions, allowing genome-wide analysis without statistical inflation. Fifty-three SNPs achieved genome-wide significance for the clinical duration phenotype, all located on chromosome 20 (top SNP rs1799990: P = 3.45 × 10^-36, beta = 0.34 under an additive model; P = 9.92 × 10^-67, beta = 0.84 under a heterozygous model). Fine mapping, conditional and expression analyses suggest that the well-known non-synonymous variant at codon 129 is clearly the dominant genome-wide determinant of clinical duration. Pathway analysis and suggestive loci are described. No genome-wide significant SNP determinants of age at onset were found, but the HS6ST3 gene was significant (P = 1.93 × 10^-6) in a gene-based test. We found no evidence of genome-wide genetic correlation between case-control (disease risk factors) and case-only (determinants of phenotypes) studies. Relative to other common genetic variants, PRNP codon 129 is by far the strongest modifier of CJD survival, suggesting that other genetic loci have only modest effects or act through rare variants.


Introduction
Human prion diseases are rare and often rapidly progressive dementias with no known treatments that slow the disease process. The most common type, sporadic Creutzfeldt-Jakob disease (sCJD), occurs at a relatively uniform annual incidence of 1–2 per million, equating to a lifetime risk of approximately 1 in 5,000 [1]. The clinical presentation and progression of the disorder are remarkably variable in initial symptoms and signs, age at onset and clinical duration [2][3][4]. Patients typically present in late middle or old age, but cases have been reported in adolescence and early adulthood, and at the extremes of old age [5][6][7]. The median clinical duration is usually reported as five months, ranging from only a few weeks to several years [2]. The ability to estimate the likely clinical duration could help with timely decisions about care [8].
Prions are proteinaceous pathogens formed of host prion protein (PrP) that cause mammalian prion diseases such as bovine spongiform encephalopathy, sheep scrapie, chronic wasting disease of cervids, and the human disorders [9]. The recently determined structures of mouse and hamster prions reveal assemblies of PrP in a parallel in-register beta-sheet structure with two domains [10,11], in marked contrast to the predominantly alpha-helical structure of normal cellular PrP [12]. Prions are thought to replicate by binding normal cellular PrP, inducing its conformational change, and subsequent fission of the growing aggregates. In several model systems, the incubation time of prion disease is influenced by PrP gene expression, primary sequence and polymorphisms, as well as by prion strain [13], which is thought to be conferred by structural variation of the pathogen [14]. Experiments using animal or cellular model systems have led to proposals of several possible non-PrP mechanisms of toxicity in prion diseases, involving PrP binding partners on the cell surface and downstream intracellular changes [15][16][17]; however, their relevance to the human diseases is yet to be determined.

Funding: This work was supported by the MRC (UK) core grant to the MRC Prion Unit at UCL (code MC_UU_00024/1). Several authors at UCL/UCLH receive funding from the Department of Health's NIHR Biomedical Research Centres funding scheme. Some of this work was supported by the Department of Health funded National Prion Monitoring Cohort study. Funding for the collection of Polish samples for study was partially provided by the EU joint programme JPND and Medical University of Lodz. The Italian national surveillance of Creutzfeldt-Jakob disease and related disorders is partially supported by the Ministero della Salute, Italy. The German National Reference Centre for TSE is funded by grants from the Robert-Koch-Institute.
Human epidemiological and genetic studies have identified factors associated with survival time in sCJD [2,8,18,19], including demographic factors, prion protein genotype, molecular strain typing of protease-resistant prion protein by Western blot analysis, and a range of biofluid, tissue, imaging and neurophysiological biomarkers [20]. Many biomarkers simply measure the rate or extent of neuronal injury, loss or dysfunction, or immune cell or glial responses, whereas genetic associations imply a causal influence on clinical phenotypes, because germline genotype is fixed before disease onset and cannot be a consequence of the disease. In this study, we sought to determine the effects of genome-wide common genetic variation on key clinical phenotypes of sCJD, to find evidence of modifiers relevant to human prion diseases that might improve understanding of disease processes and generate new ideas for therapeutics.

Diagnosis and clinical phenotypes
Details of the contributing sites and diagnostic criteria were given in a previous publication [19]. In short, all patient participants were deceased and had received a diagnosis of probable CJD in life or of definite CJD after post-mortem examination (using contemporary epidemiological criteria, which changed over the recruitment period 1990–2019). "Probable CJD" is an epidemiological term that now equates to an almost certain diagnosis of CJD post-mortem (e.g. [21]). Age at clinical onset was recorded to the nearest month. Clinical duration was defined as the time in months from the onset of the first symptom that the examining physician judged to be a component of the disease syndrome until death.
Samples used in this study were obtained over several decades, and the data were accessed from January 2023 until the time of writing.

Genotyping and quality control
In addition to 4,110 previously reported samples genotyped on an Illumina OmniExpress array [19], 819 new samples were genotyped using Illumina's Global Screening Array. Standard sample and genotyping quality control (QC) was performed using PLINK v1.90b3v; after imputation and post-imputation QC, 6,308,901 high-quality autosomal SNPs remained. Samples with a call rate below 98% and population outliers identified by multidimensional scaling were removed, as were related samples (PI_HAT > 0.1875). Only autosomal SNPs with a genotyping rate > 99%, a minor allele frequency ≥ 0.01 and no deviation from Hardy-Weinberg equilibrium (P > 10^-4) were retained. Strand-ambiguous A/T and G/C transversion SNPs, and those deviating from the mean heterozygosity (±3 SD), were excluded. For consistency with the Michigan Imputation Server pipeline, the target VCF files were checked against the 1000 Genomes Project reference panel (https://faculty.washington.edu/browning/conform-gt.html/). Genotypes were imputed using the Michigan Imputation Server (Minimac4, assuming a mixed population, with HRC r1.1 2016 (Haplotype Reference Consortium) as the reference panel and Eagle 2.4 for phasing) [22]. In post-imputation QC, SNPs with an imputation r² below 0.3 were excluded (removing 70% of poorly imputed SNPs).
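The thresholds above can be expressed as simple predicates. The sketch below is a minimal illustration, assuming per-SNP summaries (call rate, MAF, Hardy-Weinberg P value, alleles) and pairwise PI_HAT values have already been computed, for example with PLINK; the `SnpSummary` record and all function names are hypothetical and are not part of PLINK or the study pipeline. The heterozygosity and population-outlier filters are not shown.

```python
from dataclasses import dataclass

# Thresholds taken from the quality-control description in the text.
MIN_SAMPLE_CALL_RATE = 0.98   # samples below 98% call rate are removed
MAX_PI_HAT = 0.1875           # pairs above this are treated as related
MIN_SNP_CALL_RATE = 0.99      # SNP genotyping rate must exceed 99%
MIN_MAF = 0.01                # minor allele frequency >= 0.01
MIN_HWE_P = 1e-4              # Hardy-Weinberg P must exceed 1e-4
MIN_IMPUTATION_R2 = 0.3       # imputed SNPs below this r2 are dropped

# Unordered allele pairs that are strand-ambiguous (A/T and G/C).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    allele1: str
    allele2: str
    call_rate: float
    maf: float
    hwe_p: float


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and G/C SNPs look identical on both strands."""
    return frozenset((a1.upper(), a2.upper())) in AMBIGUOUS_PAIRS


def passes_snp_qc(snp: SnpSummary) -> bool:
    """Apply the pre-imputation SNP filters."""
    return (
        snp.call_rate > MIN_SNP_CALL_RATE
        and snp.maf >= MIN_MAF
        and snp.hwe_p > MIN_HWE_P
        and not is_strand_ambiguous(snp.allele1, snp.allele2)
    )


def passes_imputation_qc(r2: float) -> bool:
    """Keep imputed SNPs whose imputation r2 is at least 0.3."""
    return r2 >= MIN_IMPUTATION_R2


def filter_samples(call_rates, related_pairs):
    """Drop low-call-rate samples, then one member of each related pair.

    `call_rates` maps sample ID -> call rate; `related_pairs` is an iterable
    of (id1, id2, pi_hat) tuples, e.g. parsed from a PLINK --genome report
    (parsing not shown). The second sample of each related pair is dropped.
    """
    passing = [s for s, rate in call_rates.items() if rate >= MIN_SAMPLE_CALL_RATE]
    kept = set(passing)
    for id1, id2, pi_hat in related_pairs:
        if pi_hat > MAX_PI_HAT and id1 in kept and id2 in kept:
            kept.discard(id2)
    return [s for s in passing if s in kept]
```

Dropping the second member of each pair is only one choice; pipelines often prefer to keep the sample with the higher call rate or richer phenotype data.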

Statistical analysis
SNPTEST (v2.5.2) was used to perform association and conditional analyses under additive and heterozygous linear regression models, with sex, contributing site and 10 population-structure covariates generated with PLINK (v1.90b3v; www.cog-genomics.org/plink/1.9/) as covariates. Genetic correlation between this study (using duration as the phenotype) and the previously conducted sCJD case-control study [19] was estimated using LDSC [23], a software tool for linkage disequilibrium (LD) score regression and heritability estimation from summary statistics. Meta-analysis of the previously published case-control GWAS [19] and the case-only data described here was performed with METAL, using summary statistics as input (6,314,883 SNPs in the union list) and the sample-size-based scheme, which combines z-scores across studies in a weighted sum with weights proportional to the square root of each study's sample size. FUMA [24], which integrates MAGMA gene-based and gene-set analysis of GWAS summary data, was used for pathway analysis to identify genes and pathways associated with the sCJD phenotypes. FUMA also provides information about chromatin interaction, expression patterns and shared molecular functions between genes. MAGMA software was also used directly for gene-based and gene-set analysis [25]. Power analysis was performed using R functions taken from the GitHub repository https://github.com/kaustubhad/gwas-power provided by Kaustubh Adhikari (UCL Division of Biosciences, University College London).
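The sample-size-based meta-analysis scheme can be sketched as a weighted (Stouffer-type) combination of z-scores. This is a minimal illustration, not METAL's own code: it assumes each study's z-score is already signed relative to the same effect allele, and the function names are hypothetical. With weights equal to the square root of each study's sample size, the combined z-score has unit variance under the null hypothesis.

```python
import math
from statistics import NormalDist


def signed_z_from_p(p_value, beta):
    """Convert a two-sided P value and effect direction into a signed z-score."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must be in (0, 1]")
    z = -NormalDist().inv_cdf(p_value / 2.0)  # |z| for a two-sided test
    return math.copysign(z, beta)


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study z-scores with weights sqrt(N_i).

    Assumes every z-score refers to the same effect allele.
    Returns the combined z-score and its two-sided P value.
    """
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    p_combined = 2.0 * NormalDist().cdf(-abs(z_combined))
    return z_combined, p_combined
```

Because only z-scores and sample sizes enter the calculation, this scheme can combine a case-control study with a quantitative-trait case-only study, whose effect sizes are on different scales.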

Ethics
The research project has approval from the NHS Health Research Authority (London-Harrow Research Ethics Committee, London, UK); the REC reference is 05/Q0505/113. Written informed consent was obtained.

Results
We performed association analyses in 3,773 cases with clinical duration data (median 4.0 months, IQR 2.5–9 months) and 3,767 cases with age at onset data (median 67 years, IQR 61–73 years), all with probable or definite sCJD by contemporary diagnostic criteria and either included in a previous paper from the collaborative group [19] or newly genotyped on Illumina's Global Screening Array (Table 1). All patients were deceased. Genotype dosages were imputed using the Michigan Imputation Server [22], and 6,308,901 SNPs passed quality control.
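Imputed genotypes are stored as probabilities for the three genotypes, and the additive and heterozygous models differ in how those probabilities become a single predictor. The sketch below shows one common coding, assuming probabilities ordered as (AA, AB, BB) with B the coded allele: the additive predictor is the expected count of B (the dosage), and the heterozygous predictor is the probability of being heterozygous, which contrasts heterozygotes with both homozygote classes combined. It then fits an unadjusted least-squares slope; the function names are illustrative, and SNPTEST's own model fitting with covariates is not reproduced.

```python
def _check_probs(p_aa, p_ab, p_bb):
    """Genotype probabilities must be non-negative and sum to one."""
    probs = (p_aa, p_ab, p_bb)
    if min(probs) < 0.0 or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must be non-negative and sum to 1")


def additive_dosage(p_aa, p_ab, p_bb):
    """Expected count of allele B (0 to 2) under the additive model."""
    _check_probs(p_aa, p_ab, p_bb)
    return p_ab + 2.0 * p_bb


def heterozygous_dosage(p_aa, p_ab, p_bb):
    """Probability of heterozygosity; contrasts AB with AA and BB together."""
    _check_probs(p_aa, p_ab, p_bb)
    return p_ab


def ols_slope(x, y):
    """Least-squares slope of y on x (no covariates)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("predictor has no variance")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy data: hard-called genotypes where only heterozygotes differ.
probs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)] * 2
phenotype = [0.0, 0.8, 0.0] * 2
additive = [additive_dosage(*p) for p in probs]
heterozygous = [heterozygous_dosage(*p) for p in probs]
```

In this toy example the phenotype depends only on heterozygosity, so `ols_slope(additive, phenotype)` is zero while `ols_slope(heterozygous, phenotype)` recovers the simulated effect of 0.8. This is why the two models can give very different evidence for the same SNP.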
The median age at onset and clinical duration were 67 years and 3.8 months, respectively, for men and 67 years and 4.0 months for women. Median clinical duration (range across sites 2.0–6.0 months) and age at onset (63.5–72 years) varied by site, so site was included as a covariate in the analysis. Phenotypes were modelled as normally distributed quantitative traits following transformation using the method of Box and Cox [26] (Tables 2 and 3).
Age-based analysis did not identify any genome-wide significant SNP associations (Fig 9). Two suggestive associations were identified, on chromosome 15 near NEDD4 and on chromosome 13 near UGGT2 (S8 and S9 Figs). Gene-based analysis for age at onset with MAGMA identified HS6ST3 (P = 1.93 × 10^-6), with a similarly significant association detected using FUMA (S3 and S4 Tables). Gene-set analysis for clinical duration using FUMA (including the PRNP locus) identified binders of type-5 metabotropic glutamate receptors (GO Molecular Function ontology, n = 1738; P = 1.85 × 10^-5) (Tables 4 and 5). Gene-set analysis for age at onset using MAGMA revealed intracellular oxygen homeostasis as a significant term (P = 1.89 × 10^-6) (S5 Table). The genetic correlation between the clinical duration GWAS and the previously published case-control GWAS was 0.1467 and non-significant (P = 0.79, 95% CI −0.92 to 1.21; S6 Table). Meta-analysis of the two GWAS (case-only and case-control) showed the same strong codon 129 effect as described above, while the suggestive HDHD5 locus on chromosome 22 was no longer observed (S10 Fig).
We also calculated the power of the study for 3,773 samples at a genome-wide significance level of 5 × 10^-8, using the additive model across a range of effect sizes and minor allele frequencies. Plotting the most significant SNP (PRNP; rs1799990) and the lead SNPs of the suggestive association signals (HDHD5, rs4819962; FHIT, rs2366847; EREG, rs11727991) showed full power for rs1799990, whereas power for the three suggestive lead SNPs was borderline, at approximately 0.7–0.8 (S11 Fig). Interestingly, there was no evidence that the sCJD genetic susceptibility genes STX6 or GAL3ST1, identified in the previously published case-control study [19], modify clinical phenotypes. The identification of these genes in the case-control GWAS implicated intracellular trafficking and sphingolipid metabolism, respectively, as causal disease mechanisms. To investigate the roles of these pathways in disease phenotypes further, we compiled a comprehensive bespoke list of genes related to these pathways that have been implicated in neurodegenerative diseases, and performed MAGMA analysis (S7 and S8 Tables). This highlighted UGGT2, a gene linked to sphingolipid metabolism, as associated with sCJD age at onset.
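How power depends on sample size, allele frequency and effect size can be sketched with the standard non-centrality approximation for a 1-df test of a quantitative trait. This is a minimal sketch, assuming the additive model, a phenotype scaled to unit variance (effects in standard-deviation units) and Hardy-Weinberg proportions, so that a SNP with minor allele frequency f and per-allele effect β explains q² = 2f(1 − f)β² of the phenotypic variance. It is not the code from the gwas-power repository cited in the Methods, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df additive test for a unit-variance trait."""
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2  # variance explained by the SNP
    if not 0.0 <= q2 < 1.0:
        raise ValueError("effect too large for a unit-variance trait")
    ncp = n * q2 / (1.0 - q2)                 # non-centrality of the 1-df chi-square
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2.0)         # two-sided critical value
    shift = math.sqrt(ncp)
    # A 1-df non-central chi-square is the square of N(shift, 1), so power is
    # the probability that |Z + shift| exceeds the critical value.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
```

For example, with hypothetical values of n = 3773 and f = 0.3, `gwas_power(3773, 0.3, 0.2)` is close to 1, whereas `gwas_power(3773, 0.3, 0.05)` is close to 0, illustrating why only large-effect common variants are reliably detectable at this sample size.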

Discussion
We describe the first well-powered GWAS of phenotypic traits in sporadic human prion disease. The only clearly identified locus was PRNP itself, specifically the well-known common variant at codon 129, for the clinical duration phenotype. Conditioning on the codon 129 polymorphism removed all evidence of association at the locus, implicating the coding sequence of PRNP, rather than PrP expression, in controlling this phenotype. We found a number of suggestive loci with P < 10^-5, which require additional genetic evidence before being considered further. Pathway analysis identified binders of type-5 metabotropic glutamate receptors, which are known to mediate the downstream effects of amyloid beta bound to prion protein, as a top hit for clinical duration [27,28]. Importantly, however, this small gene set (n = 5) was non-significant after removing PRNP, so these data suggest the gene-set signal is driven by PRNP itself.

For age at onset there were no genome-wide significant SNPs, but we identified HS6ST3 in a gene-based test and intracellular oxygen homeostasis by pathway analysis (S3–S5 Tables). HS6ST3 (heparan sulfate 6-O-sulfotransferase 3) catalyses the transfer of sulfate from 3'-phosphoadenosine 5'-phosphosulfate (PAPS) to position 6 of the N-sulfoglucosamine residue (GlcNS) of heparan sulfate (HS), and could thereby modify the interactions of HS with cell-surface proteins. There is a large literature on the role of polyanionic compounds, including HS, in prion disease pathogenesis: they colocalise with PrP^C on the cell surface and with aggregated PrP^Sc [29], may act as cofactors in prion replication, and HS and related compounds potently inhibit prion propagation [30]. A role for intracellular oxygen homeostasis is less clearly linked to prion disease. Both associations were of borderline significance once multiple testing is taken into account.

We found no evidence of genetic correlation between the case-only and published case-control GWAS analyses. We observed only moderate SNP heritability for the case-control GWAS (h²SNP = 0.18–0.26, depending on the method) [19], and low heritability for the duration phenotype (h²SNP = 0.09 using LDSC). Common SNPs measured in these studies therefore explain only a small proportion of the variation in disease risk and phenotypes. The only locus common to both GWAS is PRNP; there was no evidence that SNPs at the STX6 or GAL3ST1 loci affect clinical phenotypes in lead-SNP association, gene-based or pathway analyses. Larger sample sizes, with additional risk factor discovery, may uncover shared determinants, but the current evidence suggests that, beyond PRNP, distinct mechanisms and/or stochastic factors determine disease risk, age at onset and clinical duration.
The absence of an association between PRNP cis-eQTL SNPs and clinical duration or age at onset should not deter the pursuit of PrP reduction as a therapeutic strategy. There is a wealth of evidence from animal models for the safety and potential effectiveness of this approach [31][32][33][34][35]. PRNP cis-eQTL SNPs are predominantly associated with localised tissue expression of PrP, typically in the cerebellum or cerebellar hemispheres, and have relatively modest effects. Therapeutic strategies aim for much more profound protein knock-down, which will need to be achieved across a wide range of central nervous system tissues and cell types [36].
Poleggi et al. (2018) [37] sought additional genetic modifiers in a GWAS of a small cohort of patients (E200K mutation carriers only). That study identified two SNPs within the CYP4X1 gene locus, suggesting that this gene modulates disease onset in sCJD. The top SNP identified in the Poleggi analysis (rs9793471) had P = 0.08 in our analysis.
A number of GWAS reporting genetic modifiers of age at onset in other neurological diseases have been published. One example is the case-only study of Li et al. [38], which identified several novel genes for age at onset in Alzheimer's disease. Blauwendraat et al. [39] described several modifier loci in an age-at-onset GWAS of Parkinson's disease.
Transforming the non-normal distribution of the duration phenotype was essential, because the linear association model assumes approximately Gaussian-distributed phenotype data; a strongly skewed trait risks model misspecification, which could lead to false conclusions. Several transformations of the duration and age phenotypes were tested (log, rank-based inverse normal, square root), and the Box-Cox transformation was the best option, giving the optimal correlation coefficient (the closest approximation to a normal distribution) while keeping data noise to a minimum.
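One correlation-based way to choose the Box-Cox parameter λ is the normal probability plot correlation coefficient: transform the data for each λ on a grid and keep the λ whose sorted values correlate best with expected normal quantiles. The sketch below assumes this criterion, a grid from −2 to 2 in steps of 0.1 and Blom plotting positions; the study's exact implementation is not described, maximum-likelihood estimation of λ is a common alternative, and all function names are illustrative. Box-Cox requires strictly positive values, which holds for clinical duration in months and age at onset in years.

```python
import math
from statistics import NormalDist


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_scores(n):
    """Expected normal quantiles using Blom's plotting positions."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def best_box_cox_lambda(values, grid=None):
    """Return (lambda, r) maximising the normal probability plot correlation."""
    if len(values) < 3 or any(v <= 0 for v in values):
        raise ValueError("need at least 3 strictly positive values")
    if grid is None:
        grid = [round(-2.0 + 0.1 * i, 10) for i in range(41)]
    quantiles = normal_scores(len(values))
    best_lam, best_r = None, -math.inf
    for lam in grid:
        transformed = sorted(box_cox(values, lam))
        r = pearson(transformed, quantiles)
        if r > best_r:
            best_lam, best_r = lam, r
    return best_lam, best_r
```

The transformed values, `box_cox(values, best_lam)`, would then be used as the quantitative phenotype; because the transform is monotonic, case ranks are unchanged.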
This study was limited by sample size and was restricted to the age at onset and clinical duration phenotypes, which are almost universally collected, whereas the diversity of clinical phenotypes in CJD is well known (including variable involvement of cognitive, ataxic, psychiatric, sleep and motor features). Biochemically there is diversity of PrP^Sc types, and there are different imaging, neurophysiological and fluid biomarker associations, but these parameters are collected only in smaller subsets of the data. Genetic studies of a rare disease like sCJD benefit from national investment and collaboration in prion disease surveillance [40]. Future work by the collaborative group might focus on building larger sample collections for increased power, exome or genome studies to ascertain rare and structural variants, and extending these types of analysis to other phenotypes (e.g., the well-known subtypes of CJD based on the major symptom at presentation, such as ataxia or visual processing disorder).
The transformed phenotypes are illustrated as histograms and QQ plots (Figs 1 and 2; S1 Fig). Association analyses omitting sex, age or country as covariates, alone or in any combination, did not significantly change the outcome. Principal components analysis was used to exclude cases with distinct ancestry (n = 54) and did not suggest any strong effects of ancestry on the outcomes of interest (S2 and S3 Figs). Additive and heterozygous genetic models were run genome-wide in SNPTEST with sex, contributing site and genetic ancestry covariates (see Methods) without any statistical inflation (lambda = 1.000 for both clinical duration and age at onset), as illustrated by the QQ plots in Fig 3 (duration phenotype) and Fig 4 (age phenotype). Fifty-three SNPs achieved genome-wide significance (P < 5 × 10^-8) for the clinical duration phenotype under the additive model (Fig 5 and S1 Table), all at the PRNP locus (top SNP rs1799990: P = 3.45 × 10^-36, beta = 0.34 for the additive model; P = 9.92 × 10^-67, beta = 0.84 for the heterozygous model).
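The genomic inflation factor (lambda) quoted above compares the median observed association chi-square statistic with its expected value under the null, the median of a 1-df chi-square distribution, (Φ⁻¹(0.75))² ≈ 0.455; values near 1 indicate no systematic inflation. A minimal sketch, assuming two-sided P values from 1-df tests as input; the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median 1-df chi-square implied by the P values, over its null median."""
    nd = NormalDist()
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P values must be in (0, 1]")
    # inv_cdf(p / 2) stays accurate for very small P values; squaring gives chi-square.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median
```

Because the median is used, a handful of strongly associated SNPs (such as those at PRNP) barely affects lambda; inflation reflects a genome-wide shift, for example from uncorrected population structure.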

[Table 4. Gene-set analysis (NGENES = number of genes in the gene-set dataset; BETA = regression coefficient of the gene set). https://doi.org/10.1371/journal.pone.0304528.t004]

The Dutch National Prion Disease Registry is funded by the National Institute for Public Health and the Environment (RIVM), which is part of the Ministry of Health, Welfare and Sport, The Netherlands. PS-J was supported by Instituto de Salud Carlos III [Fondo de Investigación Sanitaria, PI16/01652], Acción Estratégica en Salud integrated in the Spanish National I+D+i Plan and financed by Instituto de Salud Carlos III (ISCIII) - Subdirección General de Evaluación and the Fondo Europeo de Desarrollo

grants from Medical Research Council (UK) and grants from National Institute for Health Research's Biomedical Research Centre at University College London Hospitals NHS Foundation Trust during the conduct of the study. Gabor G Kovacs reports personal fees from Biogen, outside the submitted work. John Collinge reports grants from Medical Research Council and grants from NIHR UCLH Biomedical Research Centre during the conduct of the study, and is a Director and shareholder of D-Gen Limited, an academic spinout in the field of prion disease diagnostics, decontamination and therapeutics. Inga Zerr reports grants from the Bundesministerium für Gesundheit via Robert Koch Institute and JPND, personal fees (not related to the content of the manuscript) from Ferring Pharmaceuticals and IONIS, and speaking honoraria for medical lectures from Lilly, Biogen, Medfora and DGLN (German Society for cerebrospinal fluid diagnostics in Neurology). Maurizio Pocchiari reports personal fees from Ferring Pharmaceuticals, personal fees from CNCCS (Collection of National Chemical Compounds and Screening Center) and non-financial support from Fondazione Cellule Staminali, outside the submitted work. Michael D Geschwind has consulted for 3D Communications, Adept Field Consulting, Advanced Medical Inc., Best Doctors Inc., Second Opinion Inc., Gerson Lehrman Group Inc., Guidepoint Global LLC, InThought Consulting Inc., Market Plus, Trinity Partners LLC, Biohaven Pharmaceuticals, Quest Diagnostics and various medical-legal consulting. He has received speaking honoraria for various medical center lectures and from Oakstone publishing. He has received past research support from Alliance Biosecure, CurePSP, the Tau Consortium and Quest Diagnostics. Michael D Geschwind serves on the board of directors of San Francisco Bay Area Physicians for Social Responsibility and on the editorial board of Dementia & Neuropsychologia.